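A Roundup of Recommender-System Submissions to ICLR 2023
A survey of the recommender-system papers submitted to this year's ICLR shows that they cover a fairly broad range of research directions, including transformer-based recommendation [1], reinforcement-learning-based recommendation [2, 21], graph contrastive learning for recommendation [3], federated recommendation [5, 18], explainable recommendation [6, 7], debiased recommendation [8, 12], multi-behavior recommendation [10], robust recommendation algorithms [8, 13], sequential recommendation [17], and CTR prediction [22].
More ICLR papers can be found at the link below.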
https://openreview.net/group?id=ICLR.cc/2023/Conference#all-submissions
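The title, review ratings, link, and abstract of each paper are collected below; if a paper interests you, follow its link to read the original in full.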
1. Recommender Transformers with Behavior Pathways
2. Deep Evidential Reinforcement Learning for Dynamic Recommendations
3. Simple Yet Effective Graph Contrastive Learning for Recommendation
4. IEDR: A Context-aware Intrinsic and Extrinsic Disentangled Recommender System
5. Communication Efficient Fair Federated Recommender System
6. Explainable Recommender with Geometric Information Bottleneck
7. TGP: Explainable Temporal Graph Neural Networks for Personalized Recommendation
8. TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations
9. Where to Go Next for Recommender Systems? ID- vs. Modality-based recommender models revisited
10. Multi-Behavior Dynamic Contrastive Learning for Recommendation
11. Inverse Learning with Extremely Sparse Feedback for Recommendation
12. Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
13. StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random
14. Personalized Reward Learning with Interaction-Grounded Learning (IGL)
15. Knowledge-Driven New Drug Recommendation
16. Has it really improved? Knowledge graph based separation and fusion for recommendation
17. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
18. Dual personalization for federated recommendation on devices
19. Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model
20. Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
21. Neural Collaborative Filtering Bandits via Meta Learning
22. MaskFusion: Feature Augmentation for Click-Through Rate Prediction via Input-adaptive Mask Fusion
23. Consistent Data Distribution Sampling for Large-scale Retrieval
24. Clustering Embedding Tables, Without First Learning Them
1. Recommender Transformers with Behavior Pathways
Ratings: 5, 6, 5
https://openreview.net/forum?id=YsdscENWse9
Sequential recommendation requires the recommender to capture the evolving behavior characteristics from logged user behavior data for accurate recommendations. Nevertheless, user behavior sequences are viewed as a script with multiple ongoing threads intertwined. We find that only a small set of pivotal behaviors can be evolved into the user's future action. As a result, the future behavior of the user is hard to predict. We conclude this characteristic for sequential behaviors of each user as the behavior pathway. Different users have their unique behavior pathways. Among existing sequential models, transformers have shown great capacity in capturing global-dependent characteristics. However, these models mainly provide a dense distribution over all previous behaviors using the self-attention mechanism, making the final predictions overwhelmed by the trivial behaviors not adjusted to each user. In this paper, we build the Recommender Transformer (RETR) with a novel Pathway Attention mechanism. RETR can dynamically plan the behavior pathway specified for each user, and sparingly activate the network through this behavior pathway to effectively capture evolving patterns useful for recommendation. The key design is a learned binary route to prevent the behavior pathway from being overwhelmed by trivial behaviors. Pathway attention is model-agnostic and can be applied to a series of transformer-based models for sequential recommendation. We empirically evaluate RETR on seven intra-domain benchmarks and RETR yields state-of-the-art performance. On another five cross-domain benchmarks, RETR can capture more domain-invariant representations for sequential recommendation.
2. Deep Evidential Reinforcement Learning for Dynamic Recommendations
Ratings: 5, 8, 3
https://openreview.net/forum?id=eoUsOflG7QD
Reinforcement learning (RL) has been applied to build recommender systems (RS) to capture users' evolving preferences and continuously improve the quality of recommendations. In this paper, we propose a novel deep evidential reinforcement learning (DERL) framework that learns a more effective recommendation policy by integrating both the expected reward and evidence-based uncertainty. In particular, DERL conducts evidence-aware exploration to locate items that a user will most likely take interest in the future. Two central components of DERL include a customized recurrent neural network (RNN) and an evidential-actor-critic (EAC) module. The former module is responsible for generating the current state of the environment by aggregating historical information and a sliding window that contains the current user interactions as well as newly recommended items that may encode future interest. The latter module performs evidence-based exploration by maximizing a uniquely designed evidential Q-value to derive a policy giving preference to items with good predicted ratings while remaining largely unknown to the system (due to lack of evidence). These two components are jointly trained by supervised learning and reinforcement learning. Experiments on multiple real-world dynamic datasets demonstrate the state-of-the-art performance of DERL and its capability to capture long-term user interests.
3. Simple Yet Effective Graph Contrastive Learning for Recommendation
Ratings: 8, 5, 5
https://openreview.net/forum?id=FKXVK9dyMM
Graph neural network (GNN) is a powerful learning approach for graph-based recommender systems. Recently, GNN integrated with contrastive learning has shown superior performance with data augmentation for recommendation, with the aim of dealing with highly sparse data. Despite their success, most existing graph contrastive learning methods either perform stochastic augmentation (e.g., node/edge perturbation) on the user-item interaction graph, or rely on the heuristic-based augmentation techniques (e.g., user clustering) for generating contrastive views. We argue that these methods cannot well preserve the intrinsic semantic structures and are easily biased by the noise perturbation. In this paper, we propose a simple yet effective graph contrastive learning paradigm LightGCL that mitigates these issues that negatively impact the generality and robustness of CL-based recommenders. Our model exclusively utilizes singular value decomposition for contrastive augmentation, which enables the unconstrained structure refinement with global collaborative relation modeling. Experiments conducted on several benchmark datasets demonstrate that our method significantly improves the performance over state-of-the-arts. Further analyses show the superiority of LightGCL's robustness against data sparsity and popularity bias. The source code of our model is available at https://anonymous.4open.science/r/LightGCL/.
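To make the idea of SVD-based augmentation more concrete, the sketch below builds a low-rank view of a toy user-item matrix with NumPy. It is only an illustration of truncated SVD as a way to obtain a globally smoothed second view; it does not reproduce LightGCL's actual pipeline. The helper `low_rank_view`, the matrix `R`, and the chosen rank are all assumptions made for this example.

```python
import numpy as np

def low_rank_view(interactions: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of a user-item matrix."""
    # Thin SVD: interactions = u @ diag(s) @ vt
    u, s, vt = np.linalg.svd(interactions, full_matrices=False)
    # Keep only the `rank` largest singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]

# Toy 4-user x 5-item implicit-feedback matrix (hypothetical data).
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A rank-2 view: zero entries receive soft scores from global structure.
R_view = low_rank_view(R, rank=2)
print(np.round(R_view, 2))
```

In a contrastive setup, a view like `R_view` could be paired with the original graph so that embeddings from the two are pulled together; how exactly that pairing is done is left to the paper.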
4. IEDR: A Context-aware Intrinsic and Extrinsic Disentangled Recommender System
Ratings: 6, 3, 6
https://openreview.net/forum?id=S2N25rUM55l
Intrinsic and extrinsic factors jointly affect users' decisions in item selection (e.g., click, purchase). Intrinsic factors reveal users' real interests and are invariant in different contexts (e.g., time, weather), whereas extrinsic factors can change w.r.t. different contexts. Analyzing these two factors is an essential yet challenging task in recommender systems. However, in existing studies, factor analysis is either largely neglected, or designed for a specific context (e.g., the time context in sequential recommendation), which limits the applicability of such models. In this paper, we propose a generic model, IEDR, to learn intrinsic and extrinsic factors from various contexts for recommendation. IEDR contains two key components: a contrastive learning component, and a disentangling component. The two components collaboratively enable our model to learn context-invariant intrinsic factors and context-based extrinsic factors from all available contexts. Experimental results on real-world datasets demonstrate the effectiveness of our model in factor learning and impart a significant improvement in recommendation accuracy over the state-of-the-art methods.
5. Communication Efficient Fair Federated Recommender System
Ratings: 6, 6, 3, 5
https://openreview.net/forum?id=ZLv-8v0Sp_H
Federated Recommender Systems (FRSs) aim to provide recommendations to clients in a distributed manner with privacy preservation. FRSs suffer from high communication costs due to the communication between the server and many clients. Some past literature on federated supervised learning shows that sampling clients randomly improve communication efficiency without jeopardizing accuracy. However, each user is considered a separate client in FRS and clients communicate only item gradients. Thus, incorporating random sampling and determining the number of clients to be sampled in each communication round to retain the model's accuracy in FRS becomes challenging. This paper provides sample complexity bounds on the number of clients that must be sampled in an FRS to preserve accuracy. Next, we consider the issue of demographic bias in FRS, quantified as the difference in the average error rates across different groups. Supervised learning algorithms mitigate the group bias by adding the fairness constraint in the training loss, which requires sharing protected attributes with the server. This is prohibited in a federated setting to ensure clients' privacy. We design RS-FairFRS, a Random Sampling based Fair Federated Recommender System, which trains to achieve a fair global model. In addition, it also trains local clients towards a fair global model to reduce demographic bias at the client level without the need to share their protected attributes. We empirically demonstrate all our results across the two most popular real-world datasets (ML1M, ML100k) and different sensitive features (age and gender) to prove that RS-FairFRS helps reduce communication cost and demographic bias with improved model accuracy.
6. Explainable Recommender with Geometric Information Bottleneck
Ratings: 5, 5, 5
https://openreview.net/forum?id=I_IJf5oDRo
Explainable recommender systems have attracted much interest in recent years as they can explain their recommendation decisions, enhancing user trust in the systems. Most explainable recommender systems rely on human-generated rationales or annotated aspect features from user reviews to train models for rational generation or extraction. The rationales produced are often confined to a single review. To avoid the expensive human annotation process and to generate explanations beyond individual reviews, we propose an explainable recommender system trained on user reviews by developing a transferable Geometric Information Bottleneck (GIANT), which leverages the prior knowledge acquired through clustering on a user-item graph built on user-item rating interactions, since graph nodes in the same cluster tend to share common characteristics or preferences. We then feed user reviews and item reviews into a variational network to learn latent topic distributions which are regularised by the distributions of user/item estimated based on their distances to various cluster centroids of the user-item graph. By iteratively refining the instance-level review latent topics with GIANT, our method learns a robust latent space from the text for rating prediction and explanation generation. Experimental results on three e-commerce datasets show that our model significantly improves the interpretability of a variational recommender using a standard Gaussian prior, in terms of coherence, diversity and faithfulness, while achieving performance comparable to existing content-based recommender systems in terms of rating prediction accuracy.
7. TGP: Explainable Temporal Graph Neural Networks for Personalized Recommendation
Ratings: 5, 5, 5
https://openreview.net/forum?id=EGobBwPc1J-
The majority of item retrieval algorithms in typical "retrieval-rank-rerank" structured recommendation systems can be separated into three categories: deep latent, sequential and graph-based recommenders, which collect collaborative-filtering, sequential and homogeneous signals respectively. However, there is a conceptual overlap between sequential and graph recommenders on a user's past interacted items. It triggers an idea that the sequential, collaborative-filtering and homogeneous signals can be included in one temporal graph formatted data structure, and the sequential, latent and graph learning algorithms can be summarized as one temporal graph encoder. In this paper, Temporal Graph Plugin is proposed as such an explainable temporal graph encoder to supplement deep latent algorithms with aggregated k-hop temporal neighborhood message via a local attention module. We conduct extensive experiments on two public datasets Reddit and Wikipedia, where TGP exceeds SOTA sequential, latent, graph algorithms by 1.1%, 52.8% and 98.9% respectively, partially verifying the proposed hypothesis. Codes will be made public upon receival.
8. TDR-CL: Targeted Doubly Robust Collaborative Learning for Debiased Recommendations
Ratings: 6, 8, 6
https://openreview.net/forum?id=EIgLnNx_lC
Bias is a common problem inherent in recommender systems, which is entangled with users' preferences and poses a great challenge to unbiased learning. For debiasing tasks, the doubly robust (DR) method and its variants show superior performance due to the double robustness property, that is, DR is unbiased when either imputed errors or learned propensities are accurate. However, our theoretical analysis reveals that DR usually has a large variance. Meanwhile, DR would suffer unexpectedly large bias and poor generalization caused by inaccurate imputed errors and learned propensities, which often occur in practice. In this paper, we propose a principled approach that can effectively reduce the bias and variance simultaneously for existing DR estimators when the error-imputation model is misspecified. In addition, we further propose a novel semi-parametric collaborative counterfactual learning approach that decomposes imputed errors into parametric and nonparametric parts and updates them collaboratively, resulting in more accurate predictions. Both theoretical analysis and experiments demonstrate the superiority of the proposed methods compared with existing debiasing methods.
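For readers unfamiliar with the doubly robust (DR) estimator mentioned in this abstract, here is a minimal sketch of its standard form: the imputed error plus an inverse-propensity-weighted correction on the observed entries. This is the generic DR estimator, not the TDR-CL method itself; the function name `dr_estimate` and the toy arrays are assumptions for illustration.

```python
import numpy as np

def dr_estimate(errors, imputed, observed, propensities):
    """Generic doubly robust estimate of the mean error over all user-item pairs.

    errors       : true prediction errors (only used where observed == 1)
    imputed      : imputed errors for every pair
    observed     : 1 if the rating was observed, else 0
    propensities : estimated probability that each pair is observed
    """
    errors = np.asarray(errors, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    observed = np.asarray(observed)
    propensities = np.asarray(propensities, dtype=float)
    # The correction term is applied on observed entries only.
    correction = np.where(observed == 1, (errors - imputed) / propensities, 0.0)
    return float(np.mean(imputed + correction))

# Hypothetical toy data: four user-item pairs, two of them observed.
true_err = np.array([0.1, 0.4, 0.2, 0.9])
observed = np.array([1, 0, 1, 0])
wrong_propensities = np.array([0.9, 0.2, 0.5, 0.3])  # deliberately arbitrary

# If the imputed errors are exact, the estimate matches the true mean
# even though the propensities are wrong (one half of double robustness).
estimate = dr_estimate(true_err, true_err.copy(), observed, wrong_propensities)
print(estimate, true_err.mean())
```

The division by the propensity also hints at why variance can blow up: an observed pair with a very small propensity gets a very large weight whenever its imputed error is inaccurate.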
9. Where to Go Next for Recommender Systems? ID- vs. Modality-based recommender models revisited
Ratings: 5, 5, 5, 8, 3
https://openreview.net/forum?id=bz3MAU-RhnW
Recommender models that utilize unique identities (IDs for short) to represent distinct users and items have been the state-of-the-arts and dominating the recommender system (RS) literature for over a decade. In parallel, the pre-trained modality encoders, such as BERT and ResNet, are becoming increasingly powerful in modeling raw modality features, e.g., text and images. In light of this, a natural question arises: whether the modality (a.k.a. content) only based recommender models (MoRec) can exceed or be on par with the ID-only based models (IDRec) when item modality features are available? In fact, this question had been answered once a decade ago, when IDRec beat MoRec with strong advantages in terms of both recommendation accuracy and efficiency. We aim to revisit this "old" question and systematically study MoRec from several aspects. Specifically, we study several sub-questions: (i) which recommender paradigm, MoRec or IDRec, performs best in various practical scenarios, including regular, cold and new item scenarios? Does this hold for items with different modality features? (ii) will MoRec benefit from the latest technical advances in corresponding communities, for example, natural language processing and computer vision? (iii) what is an effective way to leverage item modality representations, freezing them or adapting them by fine-tuning on new data? (iv) are there any other factors that affect the efficacy of MoRec? To answer these questions, we conduct rigorous experiments for item recommendations with two popular modalities, i.e., text and vision. We provide empirical evidence that MoRec with standard end-to-end training is highly competitive and even exceeds IDRec in some cases. Many of our observations imply that the dominance of IDRec in terms of recommendation accuracy does not hold well when items' raw modality features are available. We promise to release all related codes & datasets upon acceptance.
10. Multi-Behavior Dynamic Contrastive Learning for Recommendation
Ratings: 6, 5, 5, 8
https://openreview.net/forum?id=ykOpK9O5qYv
Dynamic behavior modeling has become an essential task in personalized recommender systems for learning the time-evolving user preference in online platforms. However, most next-item recommendation methods follow the single type behavior learning manner, which notably limits their user representation performance in reality, since the user-item relationships are often multi-typed in real-life applications (e.g., click, tag-as-favorite, review and purchase). To offer better recommendations, this work proposes Evolving Graph Contrastive Memory Network (EGCM) to model dynamic interaction heterogeneity for multi-behavior sequential recommendation. Specifically, we first develop a multi-behavior graph encoder to capture the short-term preference heterogeneity, and preserve the dedicated relation semantics for different types of user-item interactions. In addition, we design a dynamic cross-relational memory network, empowering EGCM to distill the long-term multi-behavior preference of users and the underlying evolving cross-type behavior dependencies over time. To enhance the user representation with multi-behavior commonality and diversity, we design a multi-behavior contrastive learning paradigm with heterogeneous short- and long-term interest modeling. Experiments on several real-world datasets show the superiority of our recommender system over various state-of-the-art baselines.
11. Inverse Learning with Extremely Sparse Feedback for Recommendation
Ratings: 5, 3, 5
https://openreview.net/forum?id=_izzMPiE1y
Negative sampling is widely used in modern recommender systems, where negative data is randomly sampled from the whole item pool. However, such a strategy often introduces false-positive noises. Existing approaches about de-noising recommendation mainly focus on positive instances while ignoring the noise in the large amount of sampled negative feedback. In this paper, we propose a meta learning method to annotate the unlabeled data from loss and gradient perspectives, which considers the noises on both positive and negative instances. Specifically, we first propose inverse dual loss(IDL) to boost the true label learning and prevent the false label learning, based on the loss of unlabeled data towards true and false labels during the training process. To achieve more robust sampling on hard instances, we further propose inverse gradient(IG) to explore the correct updating gradient and adjust the updating based on meta learning. We conduct extensive experiments on a benchmark and an industrially collected dataset where our proposed method can significantly improve AUC by 9.25% against state-of-the-art methods. Further analysis verifies the proposed inverse learning is model-agnostic and can well annotate the labels combined with different recommendation backbones. The source code along with the best hyper-parameter settings is available at this link:
https://anonymous.4open.science/r/InverseLearning-4F4F.
12. Calibration Matters: Tackling Maximization Bias in Large-scale Advertising Recommendation Systems
Ratings: 6, 6, 6, 8
https://openreview.net/forum?id=wzlWiO_WY4
Calibration is defined as the ratio of the average predicted click rate to the true click rate. The optimization of calibration is essential to many online advertising recommendation systems because it directly affects the downstream bids in ads auctions and the amount of money charged to advertisers. Despite its importance, calibration often suffers from a problem called “maximization bias”. Maximization bias refers to the phenomenon that the maximum of predicted values overestimates the true maximum. The problem is introduced because the calibration is computed on the set selected by the prediction model itself. It persists even if unbiased predictions are achieved on every datapoint and worsens when covariate shifts exist between the training and test sets. To mitigate this problem, we quantify maximization bias and propose a variance-adjusting debiasing (VAD) meta-algorithm in this paper. The algorithm is efficient, robust, and practical as it is able to mitigate maximization bias problem under covariate shifts, without incurring additional online serving costs or compromising the ranking performance. We demonstrate the effectiveness of the proposed algorithm using a state-of-the-art recommendation neural network model on a large-scale real-world dataset.
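The definition of calibration and the source of maximization bias can be illustrated with a small simulation. The sketch below is not the paper's VAD algorithm. It only shows that predictions which are unbiased on average look over-calibrated once the evaluation set is chosen by the model's own scores. The constant true click rate, the noise level, and the top-10% selection rule are assumptions chosen for the demonstration, and true click probabilities stand in for observed clicks to keep the result free of sampling noise.

```python
import numpy as np

def calibration(predicted, true_rate):
    """Ratio of the average predicted click rate to the average true click rate."""
    return float(np.mean(predicted) / np.mean(true_rate))

rng = np.random.default_rng(0)
n = 10_000
true_ctr = np.full(n, 0.10)                            # every ad has a 10% true CTR
predicted = true_ctr + rng.normal(0.0, 0.02, size=n)   # unbiased, noisy predictions

# Calibration over all ads is close to 1.
cal_all = calibration(predicted, true_ctr)

# Calibration over the ads the model itself ranks highest is inflated,
# because selecting by prediction also selects positive noise.
k = n // 10
top = np.argsort(predicted)[-k:]
cal_top = calibration(predicted[top], true_ctr[top])

print(f"all ads: {cal_all:.3f}, top 10% by prediction: {cal_top:.3f}")
```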
13. StableDR: Stabilized Doubly Robust Learning for Recommendation on Data Missing Not at Random
Ratings: 8, 5, 6
https://openreview.net/forum?id=3VO1y5N7K1H
In recommender systems, users always choose the favorite items to rate, which leads to data missing not at random and poses a great challenge for unbiased evaluation and learning of prediction models. Currently, the doubly robust (DR) methods have been widely studied and demonstrate superior performance. However, in this paper, we show that DR methods are unstable and have unbounded bias, variance, and generalization bounds to extremely small propensities. Moreover, the fact that DR relies more on extrapolation will lead to suboptimal performance. To address the above limitations while retaining double robustness, we propose a stabilized doubly robust (StableDR) learning approach with a weaker reliance on extrapolation. Theoretical analysis shows that StableDR has bounded bias, variance, and generalization error bound simultaneously under inaccurate imputed errors and arbitrarily small propensities. In addition, we propose a novel learning approach for StableDR that updates the imputation, propensity, and prediction models cyclically, achieving more stable and accurate predictions. Extensive experiments show that our approaches significantly outperform the existing methods.
14. Personalized Reward Learning with Interaction-Grounded Learning (IGL)
Ratings: 6, 5, 6
https://openreview.net/forum?id=wGvzQWFyUB
In an era of countless content offerings, recommender systems alleviate information overload by providing users with personalized content suggestions. Due to the scarcity of explicit user feedback, modern recommender systems typically optimize for a fixed combination of implicit feedback signals across all users. However, this approach disregards a growing body of work that (i) implicit signals can be used by users in diverse ways, signaling anything from satisfaction to active dislike, and (ii) different users communicate preferences in different ways. We propose applying the recent Interaction Grounded Learning (IGL) paradigm to address the challenge of learning representations of diverse user communication modalities. Rather than taking a fixed, human-designed reward function, IGL is able to learn personalized reward functions for different users and then optimize directly for the latent user satisfaction. We demonstrate the success of IGL with experiments using simulations as well as with real-world production traces.
15. Knowledge-Driven New Drug Recommendation
Ratings: 3, 5, 5, 3
https://openreview.net/forum?id=83xscrmnw6Q
Drug recommendation assists doctors in prescribing personalized medications to patients based on their health conditions. Existing drug recommendation solutions adopt the supervised multi-label classification setup and only work with existing drugs with sufficient prescription data from many patients. However, newly approved drugs do not have much historical prescription data and cannot leverage existing drug recommendation methods. To address this, we formulate the new drug recommendation as a few-shot learning problem. Yet, directly applying existing few-shot learning algorithms faces two challenges: (1) complex relations among diseases and drugs and (2) numerous false-negative patients who were eligible but did not yet use the new drugs. To tackle these challenges, we propose EDGE, which can quickly adapt to the recommendation for a new drug with limited prescription data from a few support patients. EDGE maintains a drug-dependent multi-phenotype few-shot learner to bridge the gap between existing and new drugs. Specifically, EDGE leverages the drug ontology to link new drugs to existing drugs with similar treatment effects and learns ontology-based drug representations. Such drug representations are used to customize the metric space of the phenotype-driven patient representations, which are composed of a set of phenotypes capturing complex patient health status. Lastly, EDGE eliminates the false-negative supervision signal using an external drug-disease knowledge base. We evaluate EDGE on two real-world datasets: the public EHR data (MIMIC-IV) and private industrial claims data. Results show that EDGE achieves 7.3% improvement on the ROC-AUC score over the best baseline.
16. Has it really improved? Knowledge graph based separation and fusion for recommendation
Ratings: 3, 3, 3
https://openreview.net/forum?id=Su04-8n0ia4
In this paper we study the knowledge graph (KG) based recommendation systems. We first design the metric to study the relationship between different SOTA models and find that the current recommendation systems based on knowledge graph have poor ability to retain collaborative filtering signals, and higher-order connectivity would introduce noises. In addition, we explore the collaborative filtering recommendation method using GNN and design the experiment to show that the information learned between GNN models stacked with different layers is different, which provides the explanation for the unstable performance of GNN stacking different layers from a new perspective. According to the above findings, we first design the model-agnostic Cross-Layer Fusion Mechanism without any parameters to improve the performance of GNN. Experimental results on three datasets for collaborative filtering show that Cross-Layer Fusion Mechanism is effective for improving GNN performance. Then we design three independent signal extractors to mine the data at three different perspectives and train them separately. Finally, we use the signal fusion mechanism to fuse different signals. Experimental results on three datasets that introduce KG show that our KGSF achieves significant improvements over current SOTA KG based recommendation methods and the results are interpretable.
17. ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor
Ratings: 5, 8, 8, 8
https://openreview.net/forum?id=HmPOzJQhbwg
Long-term engagement is preferred over immediate engagement in sequential recommendation as it directly affects product operational metrics such as daily active users (DAUs) and dwell time. Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation. However, due to expensive online interactions, it is very difficult for RL algorithms to perform state-action value estimation, exploration and feature extraction when optimizing long-term engagement. In this paper, we propose ResAct which seeks a policy that is close to, but better than, the online-serving policy. In this way, we can collect sufficient data near the learned policy so that state-action values can be properly estimated, and there is no need to perform online exploration. ResAct optimizes the policy by first reconstructing the online behaviors and then improving it via a Residual Actor. To extract long-term information, ResAct utilizes two information-theoretical regularizers to confirm the expressiveness and conciseness of features. We conduct experiments on a benchmark dataset and a large-scale industrial dataset which consists of tens of millions of recommendation requests. Experimental results show that our method significantly outperforms the state-of-the-art baselines in various long-term engagement optimization tasks.
18. Dual personalization for federated recommendation on devices
Ratings: 5, 6, 3, 6
https://openreview.net/forum?id=8VvQ4SpvZVi
Federated recommendation is a new Internet service architecture that aims to provide privacy-preserving recommendation services in federated settings. Existing solutions are used to combine distributed recommendation algorithms and privacy-preserving mechanisms. Thus it inherently takes the form of heavyweight models at the server and hinders the deployment of on-device intelligent models to end-users. This paper proposes a novel Personalized Federated Recommendation (PFedRec) framework to learn many user-specific lightweight models to be deployed on smart devices rather than a heavyweight model on a server. Moreover, we propose a new dual personalization mechanism to effectively learn fine-grained personalization on both users and items. The overall learning process is formulated into a unified federated optimization framework. Specifically, unlike previous methods that share exactly the same item embeddings across users in a federated system, dual personalization allows mild finetuning of item embeddings for each user to generate user-specific views for item representations which can be integrated into existing federated recommendation methods to gain improvements immediately. Experiments on multiple benchmark datasets have demonstrated the effectiveness of PFedRec and the dual personalization mechanism. Moreover, we provide visualizations and in-depth analysis of the personalization techniques in item embedding, which shed novel insights on the design of RecSys in federated settings.
19. Everyone's Preference Changes Differently: Weighted Multi-Interest Retrieval Model
Ratings: 3, 8, 5, 6
https://openreview.net/forum?id=usa87QW3_r9
User embeddings (vectorized representations of a user) are essential in recommendation systems. Numerous approaches have been proposed to construct a representation for the user in order to find similar items for retrieval tasks, and they have been proven effective in industrial recommendation systems. Recently people have discovered the power of using multiple embeddings to represent a user, with the hope that each embedding represents the user's interest in a certain topic. With multi-interest representation, it's important to model the user's preference over the different topics and how the preference change with time. However, existing approaches either fail to estimate the user's affinity to each interest or unreasonably assume every interest of every user fades with an equal rate with time, thus hurting the performance of candidate retrieval. In this paper, we propose the Multi-Interest Preference (MIP) model, an approach that not only produces multi-interest for users by using the user's sequential engagement more effectively but also automatically learns a set of weights to represent the preference over each embedding so that the candidates can be retrieved from each interest proportionally. Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.
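One way to read "retrieved from each interest proportionally" is as a slot-allocation problem: given per-interest weights and a retrieval budget, decide how many candidates each interest embedding contributes. The sketch below uses largest-remainder rounding for this. It is a hypothetical illustration, not the MIP model's actual retrieval procedure, and `allocate_slots` is a name invented for this example.

```python
def allocate_slots(weights, k):
    """Split a budget of k candidates across interests in proportion to weights."""
    total = sum(weights)
    quotas = [w / total * k for w in weights]   # ideal fractional share
    slots = [int(q) for q in quotas]            # round each share down
    leftover = k - sum(slots)
    # Give the remaining slots to the interests with the largest remainders.
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: quotas[i] - slots[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        slots[i] += 1
    return slots

# Hypothetical learned weights for three interest embeddings.
print(allocate_slots([5, 3, 2], 10))
print(allocate_slots([1, 1, 1], 10))
```

Each interest would then issue its own nearest-neighbor query for its allotted number of items, and the results would be merged into the final candidate set.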
20. Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training
Ratings: 3, 3, 3, 6
https://openreview.net/forum?id=sWSWudSpYy
Modern DNN-based recommendation systems rely on training-derived real-valued embeddings of sparse categorical features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate an effective overparameterization technique for enhancing embeddings training by enabling useful cross-category learning. Our scheme trains embeddings using training-time forced factorization of the embedding (linear) layer, with an inner dimension higher than the target embedding dimension.
We show that factorization breaks update sparsity via non-homogeneous weighting of dense base embedding matrices. Such weighting controls the magnitude of weight updates in each embedding direction, and is adaptive to training-time embedding singular values. The dynamics of singular values further explains the puzzling importance of factorization inner dimension on learning enhancements.
We call the scheme multi-layer embeddings training (MLET). For deployment efficiency, MLET converts the trained two-layer embedding into a single-layer one at the conclusion of training, avoiding inference-time model size increase. MLET consistently produces better models when tested on multiple recommendation models for click-through rate (CTR) prediction. At constant model quality, MLET allows embedding dimension reduction by up to 16x, and 5.8x on average, across the models. MLET retains its benefits in combination with other table reduction methods (hashing and quantization).
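The collapse step described above (a two-layer embedding folded into a single table after training) follows from associativity of matrix multiplication. The sketch below only demonstrates that identity. The random matrices `w1` and `w2` stand in for trained weights, and all names and sizes are hypothetical rather than taken from the paper.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def two_layer_lookup(w1, w2, category):
    """Embed a category id through the factorized (two-layer) embedding."""
    inner = [w1[category]]       # 1 x inner_dim row of the first layer
    return matmul(inner, w2)[0]  # 1 x embed_dim final embedding


rng = random.Random(0)
n_categories, inner_dim, embed_dim = 6, 8, 4  # inner_dim > embed_dim, as in the abstract

# Stand-ins for trained weights (assumption: random values, no training here).
w1 = [[rng.uniform(-1, 1) for _ in range(inner_dim)] for _ in range(n_categories)]
w2 = [[rng.uniform(-1, 1) for _ in range(embed_dim)] for _ in range(inner_dim)]

# After training: fold both layers into one n_categories x embed_dim table.
collapsed = matmul(w1, w2)

# Inference only needs the collapsed table; each row matches the two-layer lookup.
for category in range(n_categories):
    two_layer = two_layer_lookup(w1, w2, category)
    assert all(abs(a - b) < 1e-12 for a, b in zip(two_layer, collapsed[category]))
```

Because every row of the collapsed table depends on the shared matrix `w2`, an update to `w2` during training changes the embeddings of all categories, including rarely seen ones. The deployed model still stores only `n_categories x embed_dim` values.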
21. Neural Collaborative Filtering Bandits via Meta Learning
Ratings: 3, 5, 5, 8
https://openreview.net/forum?id=15hYIH0TUi
Contextual multi-armed bandits provide powerful tools to solve the exploitation-exploration dilemma in decision making, with direct applications in personalized recommendation. In fact, collaborative effects among users carry significant potential to improve recommendation. In this paper, we introduce and study the problem by exploring 'Neural Collaborative Filtering Bandits', where the rewards can be non-linear functions and groups are formed dynamically given different specific contents. To solve this problem, we propose a meta-learning based bandit algorithm, Meta-Ban (meta-bandits), where a meta-learner is designed to represent and rapidly adapt to dynamic groups, along with an informative UCB-based exploration strategy. Furthermore, we analyze that Meta-Ban can achieve the regret bound of $\mathcal{O}(\sqrt{nT\log T})$, which is sharper than that of state-of-the-art related works. In the end, we conduct extensive experiments showing that Meta-Ban outperforms six strong baselines.
22. MaskFusion: Feature Augmentation for Click-Through Rate Prediction via Input-adaptive Mask Fusion
Ratings: 3, 5, 5, 8
https://openreview.net/forum?id=QzbKH8nNq_V
Click-through rate (CTR) prediction plays an important role in advertisement, recommendation, and retrieval applications. Given the feature set, how to fully utilize the information from the feature set is an active topic in deep CTR model designs. There are several existing deep CTR works focusing on feature interactions, feature attentions, and so on. They attempt to capture high-order feature interactions to enhance the generalization ability of deep CTR models. However, these works either suffer from poor high-order feature interaction modeling using DNN or ignore the balance between generalization and memorization during the recommendation. To mitigate these problems, we propose an adaptive feature fusion framework called MaskFusion, to additionally capture the explicit interactions between the input feature and the existing deep part structure of deep CTR models dynamically, besides the common feature interactions proposed in existing works. MaskFusion is an instance-aware feature augmentation method, which makes deep CTR models more personalized by assigning each feature an instance-adaptive mask and fusing each feature with each hidden state vector in the deep part structure. MaskFusion can also be integrated into any existing deep CTR model flexibly. MaskFusion achieves state-of-the-art (SOTA) performance on all seven benchmark deep CTR models with three public datasets.
23. Consistent Data Distribution Sampling for Large-scale Retrieval
Ratings: 3, 5, 5, 3
https://openreview.net/forum?id=NUU2tFxUjRa
Retrieving candidate items with low latency and computational cost is important for large-scale advertising systems. Negative sampling is a general approach to model million-scale items with rich features in retrieval. The training-inference inconsistency of data distribution caused by sampling negatives is a key challenge. In this work, we propose a novel negative sampling strategy, Consistent Data Distribution Sampling (CDDS), to solve this issue. Specifically, we employ a relatively large number of uniform training negatives and batch negatives to adequately train long-tail and hot items respectively, and employ high divergence negatives to improve the learning convergence. To make the above training samples approximate the serving item data distribution, we introduce an auxiliary loss based on an asynchronous item embedding matrix over the entire item pool. Offline experiments on real datasets achieve SOTA performance. Online experiments with multiple advertising scenarios show that our method has achieved significant increases in GMV. The source code will be released in the future.
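To make the two basic negative sources in the abstract concrete, here is a minimal sketch that mixes in-batch negatives with uniformly sampled negatives for a single training example. It covers only that mixing idea. The high divergence negatives and the auxiliary loss from the abstract are not modeled, and the function name `sample_negatives`, its parameters, and the toy item ids are all hypothetical.

```python
import random


def sample_negatives(positive_item, batch_items, item_pool, n_uniform, rng):
    """Combine in-batch negatives with uniformly sampled negatives for one example."""
    # In-batch negatives: positives of other examples, which skew toward hot items.
    batch_negs = [item for item in batch_items if item != positive_item]
    # Uniform negatives: drawn from the whole pool, so long-tail items are covered.
    eligible = [item for item in item_pool if item != positive_item]
    uniform_negs = rng.sample(eligible, min(n_uniform, len(eligible)))
    return batch_negs, uniform_negs


rng = random.Random(42)          # fixed seed so the sketch is reproducible
item_pool = list(range(1000))    # toy catalogue of item ids
batch_items = [3, 3, 7, 42]      # positives that appear in the current batch
batch_negs, uniform_negs = sample_negatives(3, batch_items, item_pool, 5, rng)
```

Items that are clicked often show up in many batches and therefore dominate the in-batch negatives, while the uniform draw gives every item in the pool the same chance of being used as a negative.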
24. Clustering Embedding Tables, Without First Learning Them
Ratings: not available
https://openreview.net/forum?id=T-DKAYt6BMk
Machine learning systems use embedding tables to work with categorical features. These tables may get extremely large in modern recommendation systems, and various methods have been suggested to fit them in memory.
Product- and Residual Vector Quantization are some of the most successful methods for table compression. They function by substituting table rows with references to "codewords" picked by k-means clustering. Unfortunately, this means that they must first know the table before compressing it, thus they can only save memory at inference time, not training time. Recent work has employed hashing-based approaches to minimize memory usage during training; however, the compression obtained is poorer than that achieved by "post-training" quantization.
We demonstrate that combining hashing and clustering based algorithms provides the best of both worlds. By first training a hashing-based "sketch", then clustering it, and then training the clustered quantization, our method may achieve compression ratios close to those of post-training quantization with the training time memory reductions of hashing-based methods. We prove that this technique works rigorously in the least-square setting.
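The codeword substitution that the abstract attributes to quantization methods can be sketched in a few lines. The example below is plain vector quantization with Lloyd's k-means on an already known table, that is, the post-training baseline and not the hash-then-cluster method the paper proposes. It is neither product nor residual quantization, and the function names, the first-k-rows initialization, and the toy table are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(row, codewords):
    """Index of the codeword closest to row (ties go to the lower index)."""
    return min(range(len(codewords)), key=lambda j: squared_distance(row, codewords[j]))


def kmeans(rows, k, iterations=10):
    """Plain Lloyd's algorithm, initialized with the first k rows (an assumption)."""
    codewords = [list(r) for r in rows[:k]]
    for _ in range(iterations):
        assignment = [nearest(r, codewords) for r in rows]
        for j in range(k):
            members = [r for r, a in zip(rows, assignment) if a == j]
            if members:  # keep the old codeword if a cluster is empty
                codewords[j] = [sum(col) / len(members) for col in zip(*members)]
    return codewords


def quantize(table, k):
    """Replace each table row by the index of its nearest codeword."""
    codewords = kmeans(table, k)
    codes = [nearest(r, codewords) for r in table]
    return codewords, codes


# Toy embedding table with two obvious groups of rows.
table = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.0], [10.0, 9.8], [0.0, 0.2], [9.8, 10.0]]
codewords, codes = quantize(table, k=2)
print(codes)  # [0, 1, 0, 1, 0, 1]
```

After quantization only the `k` codewords and one small integer per row need to be stored. As the abstract notes, this requires the full table to exist first, so the saving applies at inference time only.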